995 results for Scientific Cloud


Relevance:

100.00%

Publisher:

Abstract:

Cloud computing has established itself as the latest computing paradigm in recent years. As doing science in the cloud becomes a reality, scientists can now access public cloud centers and employ high-performance computing resources to run scientific applications. However, due to the dynamic nature of the cloud environment, the usability of scientific cloud workflow systems can deteriorate significantly without effective service quality assurance strategies. In particular, workflow temporal verification, the major approach to workflow temporal QoS (Quality of Service) assurance, plays a critical role in the on-time completion of large-scale scientific workflows. Great effort has been dedicated to workflow temporal verification in recent years, and it is high time to define the key research issues for scientific cloud workflows in order to keep research in this area on the right track. In this paper, we systematically investigate this problem and present four key research issues based on the introduction of a generic temporal verification framework. State-of-the-art solutions for each research issue and open challenges are also presented. Finally, we demonstrate SwinDeW-V, an ongoing research project on temporal verification that is part of our SwinDeW-C cloud workflow system.
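
To make the framework concrete, the following minimal Python sketch shows the kind of consistency check that temporal verification performs at a checkpoint. The function name, the simple mean-based consistency rule, and the numbers are illustrative assumptions, not taken from SwinDeW-V.

```python
# A checkpoint compares the time already consumed plus an estimate of the
# remaining work against a deadline-style temporal constraint; a violation
# triggers temporal violation handling.

def verify_temporal_consistency(elapsed, estimated_remaining, deadline):
    """Return True if the workflow is still expected to complete on time."""
    return elapsed + estimated_remaining <= deadline

# Example: 40 time units consumed, 55 more estimated, deadline of 100.
if verify_temporal_consistency(40, 55, 100):
    print("temporally consistent: continue execution")
else:
    print("temporal violation detected: invoke violation handling")
```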

Relevance:

100.00%

Publisher:

Abstract:

Many scientific workflows are data intensive: large volumes of intermediate data are generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, such data are selectively stored according to the system's storage capacity, with the selection made manually. As doing science in the cloud has become popular, more intermediate data can be stored in scientific cloud workflows based on a pay-for-use model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenance in scientific workflows. With the IDG, deleted intermediate data can be regenerated, and on this basis we develop a novel intermediate data storage strategy that reduces the cost of scientific cloud workflow systems by automatically storing appropriate intermediate data sets with one cloud service provider. The strategy has two significant merits: it achieves a cost-effective trade-off between computation cost and storage cost, and it is not strongly affected by inaccurate forecasts of data set usage. The strategy also takes users' tolerance of data access delay into consideration. We use Amazon's cost model and apply the strategy to general random workflows as well as a specific astrophysics pulsar searching workflow for evaluation. The results show that our strategy can significantly reduce the overall cost of scientific cloud workflow execution.
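
At the heart of such a strategy is a store-versus-regenerate cost comparison. The Python sketch below illustrates it with Amazon-like example prices; the rates and names are invented, and unlike the paper's IDG-based strategy it ignores the cost of regenerating deleted predecessor data sets.

```python
# Illustrative store-versus-regenerate decision for one intermediate data set.
# Rates are assumptions in the style of public cloud pricing, not the paper's.

def should_store(size_gb, storage_rate, regen_cpu_hours, cpu_rate, uses_per_month):
    """Store a data set iff keeping it costs less per month than the expected
    cost of regenerating it on demand every time it is used."""
    storage_cost = size_gb * storage_rate                      # $/month to keep
    regen_cost = regen_cpu_hours * cpu_rate * uses_per_month   # $/month to recompute
    return storage_cost <= regen_cost

# Example: 100 GB at $0.023/GB-month versus a 5 CPU-hour regeneration at
# $0.10/hour, reused twice a month.
print(should_store(100, 0.023, 5, 0.10, 2))  # False: cheaper to regenerate
```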

Relevance:

70.00%

Publisher:

Abstract:

Paper presented at the Cloud Forward Conference 2015, October 6th-8th, Pisa

Relevance:

70.00%

Publisher:

Abstract:

On-time completion is an important temporal QoS (Quality of Service) dimension and one of the fundamental requirements for high-confidence workflow systems. In recent years, a workflow temporal verification framework, which generally consists of temporal constraint setting, temporal checkpoint selection, temporal verification, and temporal violation handling, has become the major approach for assuring high temporal QoS in workflow systems. Among these components, effective temporal checkpoint selection, which aims to detect intermediate temporal violations during workflow execution in a timely fashion, plays a critical role. Temporal checkpoint selection has therefore been a major topic and has attracted significant research effort. In this paper, we present an overview of workflow temporal checkpoint selection for temporal verification. Specifically, we first introduce the throughput-based and response-time-based temporal consistency models for business and scientific cloud workflow systems, respectively. We then present the corresponding benchmark checkpoint selection strategies that satisfy the property of “necessity and sufficiency”. We also provide experimental results to demonstrate the effectiveness of our checkpoint selection strategies, and finally point out possible future issues in this research area.
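
The “necessity and sufficiency” property means a checkpoint is taken exactly when a temporal violation has occurred: no violation is missed, and no redundant verification is performed. A minimal Python illustration of such a rule, with invented activity names and durations:

```python
# Take a checkpoint at an activity exactly when its observed duration exceeds
# the duration its temporal constraint allows; otherwise verification there
# would be redundant. A simplification of the strategies in this literature.

def select_checkpoints(activities):
    """activities: list of (name, observed_duration, allowed_duration)."""
    checkpoints = []
    for name, observed, allowed in activities:
        if observed > allowed:        # a temporal violation has occurred here,
            checkpoints.append(name)  # so verification is necessary
    return checkpoints

trace = [("align", 4.0, 5.0), ("reduce", 9.5, 8.0), ("plot", 2.0, 2.5)]
print(select_checkpoints(trace))  # ['reduce']
```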

Relevance:

40.00%

Publisher:

Abstract:

Cloud computing enables independent end users and applications to share data and pooled resources, possibly located in geographically distributed data centers, in a fully transparent way. This need is particularly felt by scientific applications, which must exploit distributed resources efficiently and scalably to process large amounts of data. This paper proposes an open solution for deploying a Platform as a Service (PaaS) over a set of multi-site data centers, applying open source virtualization tools to facilitate operation among virtual machines while optimizing the usage of distributed resources. An experimental testbed is set up in an OpenStack environment, and evaluations with different types of sample TCP connections demonstrate the functionality of the proposed solution and provide throughput measurements in relation to relevant design parameters.
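
For context, a throughput evaluation of this kind boils down to timing a bulk TCP transfer between two endpoints. The self-contained Python probe below runs over loopback purely as an illustration; it is not the paper's measurement tooling, and the payload and transfer sizes are arbitrary.

```python
# Time a bulk TCP transfer and report throughput in Mbit/s.
import socket
import threading
import time

TOTAL_BYTES = 64 * 1024 * 1024   # transfer 64 MiB in total
CHUNK = b"x" * 65536             # 64 KiB per send

def sink(port):
    """Accept one connection and drain everything sent on it."""
    with socket.create_server(("127.0.0.1", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(65536):
                pass

def measure(port):
    t = threading.Thread(target=sink, args=(port,), daemon=True)
    t.start()
    time.sleep(0.2)              # give the server time to start listening
    start = time.time()
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sent = 0
        while sent < TOTAL_BYTES:
            sock.sendall(CHUNK)
            sent += len(CHUNK)
    t.join()                     # wait until the sink has drained everything
    elapsed = time.time() - start
    print(f"throughput: {TOTAL_BYTES * 8 / elapsed / 1e6:.1f} Mbit/s")

measure(50007)
```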

Relevance:

40.00%

Publisher:

Abstract:

How can applications be deployed on the cloud to achieve maximum performance? This question is challenging to address given the wide variety of cloud Virtual Machines (VMs) with different performance capabilities. The research reported in this paper addresses the question by proposing a six-step benchmarking methodology in which a user provides a set of weights that indicate how important memory, local communication, computation, and storage related operations are to an application. The user can provide either four abstract weights or eight fine-grained weights, based on knowledge of the application. The weights, along with benchmarking data collected from the cloud, are used to generate two rankings: one based only on the performance of the VMs, and another that takes both performance and cost into account. The rankings are validated on three case study applications using two validation techniques. The case studies on a set of experimental VMs highlight that maximum performance is achieved by the top three ranked VMs, and that maximum performance in a cost-effective manner is achieved by at least one of the top three ranked VMs produced by the methodology.
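
A minimal sketch of the weighting-and-ranking step: normalise each benchmarked attribute group across VMs, apply the user's weights, then rank by weighted score and by score per unit price. The four attribute groups follow the abstract, but the VM names, benchmark values, and prices below are invented.

```python
# Weighted ranking of VMs from benchmark data: one ranking by performance
# score, one by score per dollar. Inputs are illustrative placeholders.

def rank_vms(vms, weights):
    """vms: {name: {attribute: benchmark_value, ..., "price": $/hour}}
    weights: user-supplied importance of each attribute group."""
    attrs = ["memory", "local_comm", "compute", "storage"]
    maxima = {a: max(v[a] for v in vms.values()) for a in attrs}
    scores = {
        name: sum(weights[a] * v[a] / maxima[a] for a in attrs)
        for name, v in vms.items()
    }
    perf_rank = sorted(scores, key=scores.get, reverse=True)
    value_rank = sorted(scores, key=lambda n: scores[n] / vms[n]["price"],
                        reverse=True)
    return perf_rank, value_rank

vms = {
    "m4.large":  {"memory": 6.0, "local_comm": 5.0, "compute": 4.0,
                  "storage": 3.0, "price": 0.10},
    "c4.xlarge": {"memory": 5.0, "local_comm": 6.0, "compute": 8.0,
                  "storage": 4.0, "price": 0.20},
}
# A compute-heavy application weights computation three times as much.
print(rank_vms(vms, {"memory": 1, "local_comm": 1, "compute": 3, "storage": 1}))
```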

Relevance:

40.00%

Publisher:

Abstract:

The massive computation power and storage capacity of cloud computing systems enable users either to store large generated scientific datasets in the cloud or to delete and then regenerate them whenever they are reused. Under the pay-as-you-go model, the more datasets we store, the more storage cost we pay; alternatively, we can delete some generated datasets to save storage cost, but more computation cost is then incurred for regeneration whenever the datasets are reused. Hence there exists a trade-off between computation and storage in the cloud, where different storage strategies lead to different total costs. The minimum cost, which reflects the best trade-off, is an important benchmark for evaluating the cost-effectiveness of storage strategies. However, the current benchmarking approach is neither efficient nor practical enough to be applied on the fly at runtime. In this paper, we propose a novel Partitioned Solution Space based approach with efficient algorithms for dynamic yet practical on-the-fly minimum cost benchmarking of storing generated datasets in the cloud. In this approach, we pre-calculate all possible minimum cost storage strategies and save them in different partitioned solution spaces. The minimum cost storage strategy represents the minimum cost benchmark, and whenever the datasets' storage cost changes at runtime (e.g. new datasets are generated and/or existing datasets' usage frequencies change), our algorithms can efficiently retrieve the current minimum cost storage strategy from the partitioned solution space and update the benchmark. By dynamically keeping the benchmark updated, our approach can be practically utilised on the fly at runtime in the cloud, and the minimum cost benchmark can be either proactively reported or instantly returned upon request. Case studies and experimental results based on the Amazon cloud show the efficiency, scalability and practicality of our approach.
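
The benchmark being computed is the cheapest store-or-delete assignment over the dataset dependency structure. The toy Python version below makes that concrete for a linear pipeline by brute force; the paper's contribution is precisely to avoid such exhaustive search via pre-computed partitioned solution spaces, and all costs here are invented.

```python
# Toy minimum-cost benchmark for a linear pipeline of generated datasets:
# each dataset is either stored (pay storage per period) or deleted (pay to
# recompute it, and any deleted ancestors, from the nearest stored one on
# every use). Exhaustive search, for illustration only.
import itertools

def total_cost(datasets, stored):
    """datasets: list of (storage_cost, compute_cost, uses_per_period).
    stored: set of indices kept in storage; raw input data is always kept."""
    cost = 0.0
    for i, (storage, _, uses) in enumerate(datasets):
        if i in stored:
            cost += storage
        else:
            regen, j = 0.0, i
            while j >= 0 and j not in stored:   # walk back to a stored ancestor
                regen += datasets[j][1]
                j -= 1
            cost += regen * uses
    return cost

def minimum_cost_benchmark(datasets):
    n = len(datasets)
    return min(total_cost(datasets, set(combo))
               for r in range(n + 1)
               for combo in itertools.combinations(range(n), r))

pipeline = [(5.0, 2.0, 1.0), (1.0, 6.0, 0.5), (4.0, 3.0, 2.0)]
print(minimum_cost_benchmark(pipeline))
```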

Relevance:

40.00%

Publisher:

Abstract:

Scientific workflows are complex, data-intensive applications. How to achieve an effective data placement scheme in a hybrid cloud environment has become a crucial issue, especially with the new challenges brought by security requirements. Traditional data placement strategies usually adopt a load-balancing-based partition model to allocate datasets. Although such placement schemes perform well in load balancing, their data transfer time may not be optimal. In contrast, this paper focuses on the hybrid cloud environment and proposes a data-dependency-destruction-based partition model that minimizes the data dependency destroyed by partitioning. In addition, it presents a novel datacenter-oriented data placement strategy. This strategy allocates highly dependent datasets to the same datacenter according to the new partition model, and thus significantly reduces data transfer time between datacenters. Experimental results show that the proposed strategy can effectively reduce data transfer time during workflow execution.
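
In graph terms, the partition model minimizes the total weight of dataset dependencies cut by the split across datacenters. The Python sketch below brute-forces that objective for two datacenters, with a pinned set standing in for security-constrained datasets and a capacity bound standing in for load limits; all names, weights, and bounds are invented, and the paper's actual partition model is more sophisticated.

```python
# Choose a two-datacenter split of datasets that minimizes the total weight
# of dependencies crossing the split ("dependency destruction").
import itertools

def dependency_destruction(edges, group_a):
    """edges: {(d1, d2): dependency_weight}; group_a: datasets in datacenter A."""
    return sum(w for (u, v), w in edges.items()
               if (u in group_a) != (v in group_a))

def best_partition(datasets, edges, fixed_a=frozenset(), capacity=2):
    """fixed_a: datasets pinned to datacenter A (e.g. for security reasons);
    capacity: maximum number of datasets datacenter A can hold."""
    movable = [d for d in datasets if d not in fixed_a]
    best = None
    for r in range(len(movable) + 1):
        for combo in itertools.combinations(movable, r):
            group_a = fixed_a | set(combo)
            if len(group_a) > capacity:
                continue
            cut = dependency_destruction(edges, group_a)
            if best is None or cut < best[0]:
                best = (cut, group_a)
    return best

edges = {("d1", "d2"): 9, ("d2", "d3"): 1, ("d1", "d3"): 4}
print(best_partition({"d1", "d2", "d3"}, edges, fixed_a=frozenset({"d3"})))
```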

Relevance:

30.00%

Publisher:

Abstract:

Recent advances in hardware development, coupled with the rapid adoption and broad applicability of cloud computing, have introduced widespread heterogeneity in data centers, significantly complicating the management of cloud applications and data center resources. This paper presents the CACTOS approach to cloud infrastructure automation and optimization, which addresses heterogeneity by combining in-depth analysis of application behavior with insights from commercial cloud providers. The aim of the approach is threefold: to model applications and data center resources, to simulate applications and resources for planning and operation, and to optimize application deployment and resource use in an autonomic manner. The approach is based on case studies from the areas of business analytics, enterprise applications, and scientific computing.

Relevance:

30.00%

Publisher:

Abstract:

With the availability of a wide range of cloud Virtual Machines (VMs), it is difficult to determine which VMs can maximise the performance of an application. Benchmarking is commonly used to this end to capture the performance of VMs. Most cloud benchmarking techniques are heavyweight: time-consuming processes that have to benchmark the entire VM in order to obtain accurate benchmark data. Such benchmarks cannot be used in real time on the cloud and incur extra costs even before an application is deployed.

In this paper, we present lightweight cloud benchmarking techniques that execute quickly and can be used in near real time on the cloud. The exploration of lightweight benchmarking techniques is facilitated by the development of DocLite - Docker Container-based Lightweight Benchmarking. DocLite is built on Docker container technology, which allows a user-defined portion of the VM (such as a given memory size and number of CPU cores) to be benchmarked. DocLite operates in two modes: in the first mode, containers are used to benchmark a small portion of the VM to generate performance ranks; in the second mode, historic benchmark data is used along with the first mode, as a hybrid, to generate VM ranks. The generated ranks are evaluated against three scientific high-performance computing applications. The proposed techniques are up to 91 times faster than a heavyweight technique that benchmarks the entire VM. The first mode generates ranks with over 90% and 86% accuracy for sequential and parallel execution of an application, respectively. The hybrid mode improves the correlation slightly, but the first mode is sufficient for benchmarking cloud VMs.
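
Constraining a benchmark to a slice of the VM is exactly what Docker's resource flags provide. The Python wrapper below illustrates the idea; the --cpus and --memory options are standard Docker flags, but the image name and benchmark command are placeholders, and this is not DocLite's actual implementation.

```python
# Run a benchmark inside a container limited to a user-defined slice of the
# VM's resources, in the spirit of container-based lightweight benchmarking.
import subprocess

def run_constrained_benchmark(image, command, cpus, memory):
    """Run `command` in `image`, limited to `cpus` cores and `memory` RAM."""
    result = subprocess.run(
        ["docker", "run", "--rm",
         f"--cpus={cpus}",       # cap CPU cores available to the container
         f"--memory={memory}",   # cap memory available to the container
         image] + command,
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# e.g. benchmark 1 core and 1 GiB of an 8-core VM (placeholder image/command).
print(run_constrained_benchmark("benchmark-image", ["./run-benchmark"], 1, "1g"))
```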

Relevance:

30.00%

Publisher:

Abstract:

When orchestrating Web service workflows, the geographical placement of the orchestration engine(s) can greatly affect workflow performance. Data may have to be transferred across long geographical distances, which in turn increases execution time and degrades the overall performance of a workflow. In this paper, we present a framework that, given a DAG-based workflow specification, computes the optimal Amazon EC2 cloud regions in which to deploy the orchestration engines and execute a workflow. The framework incorporates a constraint model, generated using an automated constraint modelling system, that solves the workflow deployment problem. The feasibility of the framework is evaluated by executing different sample workflows representative of scientific workloads. The experimental results indicate that the framework reduces workflow execution time, providing a speedup of 1.3x-2.5x over centralised approaches.
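
Stripped to its essence, the deployment decision weighs inter-region transfer times against the data each engine must move. The Python sketch below picks a single engine region by brute force; the real framework solves a richer constraint model over DAG-structured workflows, and the regions, timings, and data sizes here are invented.

```python
# Pick the region for an orchestration engine that minimises total data
# movement, given where each workflow service lives and how much data the
# engine exchanges with it.

def best_engine_region(regions, transfer_time, tasks):
    """tasks: list of (service_region, data_mb) the engine exchanges data with.
    transfer_time: {(from_region, to_region): seconds per MB}."""
    def cost(engine_region):
        return sum(transfer_time[(engine_region, r)] * mb for r, mb in tasks)
    return min(regions, key=cost)

regions = ["us-east-1", "eu-west-1"]
transfer_time = {
    ("us-east-1", "us-east-1"): 0.01, ("us-east-1", "eu-west-1"): 0.08,
    ("eu-west-1", "eu-west-1"): 0.01, ("eu-west-1", "us-east-1"): 0.08,
}
tasks = [("us-east-1", 100), ("eu-west-1", 400)]
print(best_engine_region(regions, transfer_time, tasks))  # eu-west-1
```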

Relevance:

30.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2015

Relevance:

30.00%

Publisher:

Abstract:

Cloud computing is increasingly being adopted in different scenarios, such as social networking, business applications, and scientific experiments. Relying on virtualization technology, the construction of these computing environments targets improvements in the infrastructure, such as power efficiency and the fulfillment of users' SLA specifications. The methodology usually applied is to pack all the virtual machines onto the appropriate physical servers. However, failures in these networked computing systems can have a substantial negative impact on system performance, deviating the system from its initial objectives. In this work, we propose adapted algorithms to dynamically map virtual machines to physical hosts, in order to improve the power efficiency of the cloud infrastructure with low impact on the performance required by users. Our decision-making algorithms leverage proactive fault-tolerance techniques to deal with system failures, combined with virtual machine technology to share node resources in an accurate and controlled manner. The results indicate that our algorithms perform better in targeting power efficiency and SLA fulfillment in the face of cloud infrastructure failures.
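
A minimal sketch of this kind of mapping: first-fit-decreasing packing of VMs onto the fewest hosts, proactively skipping hosts whose predicted failure risk exceeds a threshold, so that hosts left idle can be powered down. The risk scores, threshold, and capacities are illustrative assumptions, not the paper's algorithms.

```python
# Power-aware VM-to-host mapping with proactive fault avoidance: consolidate
# VMs onto few hosts, but never onto hosts predicted likely to fail.

def place_vms(vms, hosts, risk_threshold=0.2):
    """vms: {name: cpu_demand}; hosts: {name: (cpu_capacity, failure_risk)}."""
    placement, used = {}, {h: 0 for h in hosts}
    for vm, demand in sorted(vms.items(), key=lambda kv: -kv[1]):
        for host, (capacity, risk) in hosts.items():
            if risk > risk_threshold:       # proactively avoid risky hosts
                continue
            if used[host] + demand <= capacity:
                placement[vm] = host
                used[host] += demand
                break
        else:
            placement[vm] = None            # no safe host: flag for handling
    # Hosts left empty can be powered down for energy savings.
    idle = [h for h, u in used.items() if u == 0]
    return placement, idle

vms = {"vm1": 4, "vm2": 2, "vm3": 2}
hosts = {"h1": (8, 0.05), "h2": (8, 0.5), "h3": (4, 0.1)}
print(place_vms(vms, hosts))  # all VMs on h1; h2 (risky) and h3 stay idle
```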